Interactive Visualization for Topic Model Curation

Authors

  • Guoray Cai
  • Feng Sun
  • Yongzhong Sha
Abstract

Topic modeling methods can help analysts understand the content of a large text corpus, but the discovered topics often do not make clear sense to human analysts. Interactive topic modeling addresses this problem by allowing a human to steer the topic model curation process (generate, interpret, diagnose, and refine). However, humans have limited ability to work with the artifacts of computational topic models, since these artifacts are difficult to interpret and harvest. This paper explores the nature of such challenges and provides a visual analytic solution in the context of supporting political scientists in understanding the thematic content of online petition data. We use interactive topic modeling of the White House online petition data as a lens to bring up key points of discussion and to highlight the unsolved problems as well as the potential utility of visual analytics methods.

Similar resources

Assessing the Preservation Condition of Large and Heterogeneous Electronic Records Collections with Visualization

As collections become larger in size, more complex in structure and increasingly diverse in composition, new approaches are needed to help curators assess digital files and make decisions about their long-term preservation. We present research on the use of interactive visualization to analyze file characterization information for the purpose of assessing the preservation condition of a vast co...

GoMapMan: integration, consolidation and visualization of plant gene annotations within the MapMan ontology

GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Curr...

Xenbase: a genomic, epigenomic and transcriptomic model organism database

Xenbase (www.xenbase.org) is an online resource for researchers utilizing Xenopus laevis and Xenopus tropicalis, and for biomedical scientists seeking access to data generated with these model systems. Content is aggregated from a variety of external resources and also generated by in-house curation of scientific literature and bioinformatic analyses. Over the past two years many new types of c...

Hiérarchie: Interactive Visualization for Hierarchical Topic Models

Existing algorithms for understanding large collections of documents often produce output that is nearly as difficult and time consuming to interpret as reading each of the documents themselves. Topic modeling is a text understanding algorithm that discovers the “topics” or themes within a collection of documents. Tools based on topic modeling become increasingly complex as the number of topics...

Concurrent Visualization of Relationships between Words and Topics in Topic Models

Analysis tools based on topic models are often used as a means to explore large amounts of unstructured data. Users often reason about the correctness of a model using relationships between words within the topics or topics within the model. We compute this useful contextual information as term co-occurrence and topic covariance and overlay it on top of standard topic model output via an intuit...

Publication date: 2018